Recognizing Hand-written Digits Using Hierarchical Products of Experts

Authors

  • Guy Mayraz
  • Geoffrey E. Hinton
Abstract

The product of experts learning procedure [1] can discover a set of stochastic binary features that constitute a non-linear generative model of handwritten images of digits. The quality of generative models learned in this way can be assessed by learning a separate model for each class of digit and then comparing the unnormalized probabilities of test images under the 10 different class-specific models. To improve discriminative performance, it is helpful to learn a hierarchy of separate models for each digit class. Each model in the hierarchy has one layer of hidden units, and the $n$th level model is trained on data that consists of the activities of the hidden units in the already trained $(n-1)$th level model. After training, each level produces a separate, unnormalized log probability score. With a three-level hierarchy for each of the 10 digit classes, a test image produces 30 scores, which can be used as inputs to a supervised, logistic classification network that is trained on separate data. On the MNIST database, our system is comparable with current state-of-the-art discriminative methods, demonstrating that the product of experts learning procedure can produce effective generative models of high-dimensional data.

1 Learning products of stochastic binary experts

Hinton [1] describes a learning algorithm for probabilistic generative models that are composed of a number of experts. Each expert specifies a probability distribution over the visible variables, and the experts are combined by multiplying these distributions together and renormalizing:

$$p(\mathbf{d} \mid \theta_1, \ldots, \theta_n) = \frac{\prod_m p_m(\mathbf{d} \mid \theta_m)}{\sum_{\mathbf{c}} \prod_m p_m(\mathbf{c} \mid \theta_m)} \qquad (1)$$

where $\mathbf{d}$ is a data vector in a discrete space, $\theta_m$ is all the parameters of individual model $m$, $p_m(\mathbf{d} \mid \theta_m)$ is the probability of $\mathbf{d}$ under model $m$, and $\mathbf{c}$ is an index over all possible vectors in the data space.
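
As a small illustration of Eq. 1, the following sketch multiplies two experts over 3-bit binary vectors and renormalizes by summing over the whole data space. The factorized Bernoulli experts, the function names, and the probabilities are assumptions chosen for the example and do not come from the paper; exhaustive enumeration is only feasible for such tiny spaces.

```python
from itertools import product


def bernoulli_expert(probs):
    """Build a hypothetical expert p_m(d): independent Bernoulli bits."""
    def p(d):
        result = 1.0
        for bit, q in zip(d, probs):
            result *= q if bit == 1 else 1.0 - q
        return result
    return p


def product_of_experts(experts, n_bits):
    """Multiply the expert distributions and renormalize, as in Eq. 1."""
    unnormalized = {}
    # Enumerate every vector c in the discrete data space.
    for c in product([0, 1], repeat=n_bits):
        value = 1.0
        for expert in experts:
            value *= expert(c)
        unnormalized[c] = value
    z = sum(unnormalized.values())  # denominator of Eq. 1
    return {c: v / z for c, v in unnormalized.items()}


# Two illustrative experts: one favors bit 0 being on, the other bit 1.
experts = [
    bernoulli_expert([0.9, 0.5, 0.5]),
    bernoulli_expert([0.5, 0.9, 0.5]),
]
poe = product_of_experts(experts, n_bits=3)
print(poe[(1, 1, 0)], poe[(0, 0, 0)])
```

Vectors that every expert finds plausible keep most of the probability mass, while a vector that any single expert dislikes is suppressed, which is the property that makes products sharper than mixtures.
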
A restricted Boltzmann machine (RBM) [2, 3] is a special case of a product of experts in which each expert is a single, binary stochastic hidden unit that has symmetrical connections to a set of visible units, and connections between the hidden units are forbidden. Inference in an RBM is much easier than in a general Boltzmann machine, and it is also much easier than in a causal belief net because there is no explaining away. There is therefore no need to perform any iteration to determine the activities of the hidden units. The hidden states, $s_j$, are conditionally independent given the visible states, $s_i$, and the distribution of $s_j$ is given by the standard logistic function:

$$p(s_j = 1) = \frac{1}{1 + \exp\left(-\sum_i w_{ij} s_i\right)} \qquad (2)$$

Conversely, the hidden states of an RBM are marginally dependent, so it is easy for an RBM to learn population codes in which units may be highly correlated. It is hard to do this in causal belief nets with one hidden layer because the generative model of a causal belief net assumes marginal independence.

An RBM can be trained using the standard Boltzmann machine learning algorithm, which follows a noisy but unbiased estimate of the gradient of the log likelihood of the data. One way to implement this algorithm is to start the network with a data vector on the visible units and then to alternate between updating all of the hidden units in parallel and updating all of the visible units in parallel. Each update picks a binary state for a unit from its posterior distribution given the current states of all the units in the other set.
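
The conditional in Eq. 2 and one round of the alternating parallel updates described above can be sketched as follows. This is a minimal sketch, not the authors' code: biases are omitted to match Eq. 2, `weights[i][j]` is assumed to hold $w_{ij}$ between visible unit $i$ and hidden unit $j$, and all function names and numeric values are hypothetical.

```python
import math
import random


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(visible, weights):
    """p(s_j = 1 | visible states) for every hidden unit j (Eq. 2)."""
    n_hidden = len(weights[0])
    return [
        logistic(sum(weights[i][j] * visible[i] for i in range(len(visible))))
        for j in range(n_hidden)
    ]


def visible_probs(hidden, weights):
    """p(s_i = 1 | hidden states), using the same symmetric weights."""
    return [
        logistic(sum(weights[i][j] * hidden[j] for j in range(len(hidden))))
        for i in range(len(weights))
    ]


def sample_binary(probs, rng):
    """Pick a binary state for each unit from its conditional distribution."""
    return [1 if rng.random() < p else 0 for p in probs]


def gibbs_step(visible, weights, rng):
    """Update all hidden units in parallel, then all visible units in parallel."""
    hidden = sample_binary(hidden_probs(visible, weights), rng)
    new_visible = sample_binary(visible_probs(hidden, weights), rng)
    return hidden, new_visible


# Illustrative 2-visible, 2-hidden network with made-up weights.
weights = [[1.0, -1.0], [2.0, 0.0]]
rng = random.Random(0)
hidden, visible = gibbs_step([1, 0], weights, rng)
print(hidden, visible)
```

Because the hidden units are conditionally independent given the visible units, the whole hidden layer is sampled in a single pass with no iteration, which is the point made in the text.
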
If this alternating Gibbs sampling is run to equilibrium, there is a very simple way to update the weights so as to minimize the Kullback-Leibler divergence, $Q^0 \| Q^\infty$, between the data distribution, $Q^0$, and the equilibrium distribution of fantasies over the visible units, $Q^\infty$, produced by the RBM [4]:

$$\Delta w_{ij} \propto \langle s_i s_j \rangle_{Q^0} - \langle s_i s_j \rangle_{Q^\infty} \qquad (3)$$

where $\langle s_i s_j \rangle_{Q^0}$ is the expected value of $s_i s_j$ when data is clamped on the visible units and the hidden states are sampled from their conditional distribution given the data, and $\langle s_i s_j \rangle_{Q^\infty}$ is the expected value of $s_i s_j$ after prolonged Gibbs sampling.

This learning rule does not work well because it can take a long time to approach thermal equilibrium and the sampling noise in the estimate of $\langle s_i s_j \rangle_{Q^\infty}$ can swamp the gradient. [1] shows that it is far more effective to minimize the difference between $Q^0 \| Q^\infty$ and $Q^1 \| Q^\infty$, where $Q^1$ is the distribution of the one-step reconstructions of the data that are produced by first picking binary hidden states from their conditional distribution given the data and then picking binary visible states from their conditional distribution given the hidden states. The exact gradient of this "contrastive divergence" is complicated because the distribution $Q^1$ depends on the weights, but [1] shows that this dependence can safely be ignored to yield a simple and effective learning rule for following the approximate gradient of the contrastive divergence:

$$\Delta w_{ij} \propto \langle s_i s_j \rangle_{Q^0} - \langle s_i s_j \rangle_{Q^1} \qquad (4)$$

For images of digits, it is possible to apply Eq. 4 directly if we use stochastic binary pixel intensities, but it is more effective to normalize the intensities to lie in the range $[0, 1]$ and then to use these real values as the inputs to the hidden units. During reconstruction, the stochastic binary pixel intensities required by Eq. 4 are also replaced by real-valued probabilities. Finally, the learning rule can be made less noisy by replacing the stochastic binary activities of the hidden units by their expected values.
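
A single contrastive-divergence update with the modifications just described (real-valued pixel intensities in $[0, 1]$, real-valued reconstruction probabilities, and expected hidden activities in the statistics) might look like the sketch below. It is an illustration under stated assumptions, not the authors' implementation: it processes one training vector at a time, so each expectation is estimated from a single case, it omits biases, and the name `cd1_update` and the learning rate are hypothetical. Hidden states are sampled as binary values before the reconstruction is computed.

```python
import math
import random


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def cd1_update(data, weights, learning_rate, rng):
    """Return updated weights and the one-step reconstruction for one data vector."""
    n_visible, n_hidden = len(weights), len(weights[0])

    # Data-driven phase: expected hidden activities given the real-valued data.
    h0 = [
        logistic(sum(weights[i][j] * data[i] for i in range(n_visible)))
        for j in range(n_hidden)
    ]
    # Binary hidden states are sampled for use in the reconstruction.
    h_sample = [1 if rng.random() < p else 0 for p in h0]

    # Reconstruction: real-valued pixel probabilities instead of binary pixels.
    v1 = [
        logistic(sum(weights[i][j] * h_sample[j] for j in range(n_hidden)))
        for i in range(n_visible)
    ]
    # Expected hidden activities given the reconstruction.
    h1 = [
        logistic(sum(weights[i][j] * v1[i] for i in range(n_visible)))
        for j in range(n_hidden)
    ]

    # Weight change: data statistics minus one-step reconstruction statistics.
    new_weights = [
        [
            weights[i][j] + learning_rate * (data[i] * h0[j] - v1[i] * h1[j])
            for j in range(n_hidden)
        ]
        for i in range(n_visible)
    ]
    return new_weights, v1


# Illustrative call on a 2-pixel "image" with zero initial weights.
rng = random.Random(0)
weights = [[0.0, 0.0], [0.0, 0.0]]
new_weights, reconstruction = cd1_update([1.0, 0.0], weights, 0.1, rng)
print(new_weights, reconstruction)
```

In practice the statistics would be averaged over many training cases before each weight change; the single-case form is used here only to keep the arithmetic easy to follow.
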
So the learning rule we actually use, with the binary states $s_i$ and $s_j$ replaced by the corresponding real-valued probabilities $p_i$ and $p_j$, is:

$$\Delta w_{ij} \propto \langle p_i p_j \rangle_{Q^0} - \langle p_i p_j \rangle_{Q^1} \qquad (5)$$

Stochastically chosen binary states of the hidden units are still used for computing the probabilities of the reconstructed pixels. This prevents each real-valued hidden probability from conveying more than 1 bit of information to the reconstruction.

2 The MNIST database

MNIST, a standard database for testing digit recognition algorithms, is available at http://www.research.att.com/~yann/ocr/mnist/index.html. MNIST ...

Similar resources

Recognition of Persian handwritten digits using Characterization Loci and Mixture of Experts

A method for recognition of Persian handwritten digits based on characterization loci and mixture of experts is proposed. This method utilizes the characterization loci, as the main feature. In the classification stage of our proposed method the mixture of experts are applied. This recognition method is applied to Farsi hand-written digits in the HODA database. The experimental results support ...

Comparing the Machine Ability to Recognize Hand-Written Hindu and Arabic Digits

The main aim of this work is to compare Hindu and Arabic digits with respect to a machine’s ability to recognize them. This comparison is done on the raw representation (images) of the digits and on their features extracted using two feature selection methods. Three learning algorithms with different inductive biases were used in the comparison performed using the raw representation; two of the...

Brain Decoding-Classification of Hand Written Digits from fMRI Data Employing Bayesian Networks

We are frequently exposed to hand written digits 0-9 in today's modern life. Success in decoding-classification of hand written digits helps us understand the corresponding brain mechanisms and processes and assists seriously in designing more efficient brain-computer interfaces. However, all digits belong to the same semantic category and similarity in appearance of hand written digits makes t...

Recognizing Hand-Printed Digits with a Distance Quasi-Metric

A distance quasi-metric for pattern recognition is presented. The “quasi” modifier distinguishes the metric from “true” distance metrics which obey a set of standard constraints. By relaxing one of the constraints, and coupling it with a fast multi-dimensional search technique, the metric demonstrates improved accuracy and efficiency compared to other metrics in recognizing hand-written digit s...

Project CellNet: Evolving an Autonomous Pattern Recogniser

We describe the desire for a black box approach to pattern classification: a generic Autonomous Pattern Recognizer, which is capable of self-adapting to specific alphabets without human intervention. The CellNet software system is introduced, an evolutionary system that optimizes a set of pattern-recognizing agents relative to a provided set of features and a given pattern database. CellNet uti...

Reading Digits in Natural Images with Unsupervised Feature Learning

Detecting and reading text from natural images is a hard computer vision task that is central to a variety of emerging applications. Related problems like document character recognition have been widely studied by computer vision and machine learning researchers and are virtually solved for practical applications like reading handwritten digits. Reliably recognizing characters in more complex s...

Journal:
  • IEEE Trans. Pattern Anal. Mach. Intell.

Volume 24

Publication date 2000